Abstract

This paper derives a population-sizing relationship for genetic programming (GP). Following the population-sizing derivation for genetic algorithms in Goldberg, Deb, and Clark (1992), it considers building block decision making as a key facet. The analysis yields a GP-unique relationship because it has to account for bloat and for the fact that GP solutions often use subsolutions multiple times. The population-sizing relationship depends upon tree size, solution complexity, problem difficulty and building block expression probability. The relationship is used to analyze and empirically investigate population sizing for three model GP problems named ORDER, ON-OFF and LOUD. These problems exhibit bloat to differing extents and differ in whether their solutions require the use of a building block multiple times.



1 Introduction 

The growth in application of genetic programming (GP) to problems of practical and scientific importance is remarkable (Keijzer, O'Reilly, Lucas, Costa, & Soule, 2004; Riolo & Worzel, 2003; Cantú-Paz, Foster, Deb, Davis, Roy, O'Reilly, Beyer, Standish, Kendall, Wilson, Harman, Wegener, Dasgupta, Pi...). Yet, despite this increasing interest and empirical success, GP researchers and practitioners are often frustrated, and sometimes stymied, by the lack of theory available to guide them in selecting key algorithm parameters or to help them explain empirical findings in a systematic manner. For example, GP population sizes run from ten to a million members or more, but at present there is no practical guide to knowing when to choose which size.
To continue addressing this issue, this paper builds on a previous paper (Sastry, O'Reilly, Goldberg, & Hill, 2003) wherein we considered the building block supply problem for GP. In this earlier step, we asked what population size is required to ensure the presence of all raw building blocks for a given tree size (or size distribution) in the initial population. The building-block supply based population size is conservative because it does not guarantee the growth in the market share of good substructures. That is, while ensuring the building-block supply is important for a selectorecombinative algorithm's success, ensuring a growth in the market share of good building blocks by correctly deciding between competing building blocks is also critical (Goldberg, 2002). Furthermore, the population size needed for GA success is usually bounded by the population size required for making good decisions between competing building blocks. Our results herein show this to be the case, at least for the ORDER problem.

Therefore, the purpose of this paper is to derive a population-sizing model to ensure good decision making between competing building blocks. Our analytical approach is similar to that used by Goldberg, Deb, and Clark (1992) for developing a population-sizing model based on decision making for genetic algorithms (GAs). In our population-sizing model, we incorporate factors that are common to both GP and GAs, as well as those that are unique to GP. We verify the population-sizing model on three different test problems that span the dimension of building block expression, thus modeling the phenomenon of bloat at various degrees. Using ORDER, with UNITATION as its fitness function, provides a model problem where, per tree, a building block can be expressed only once despite being present multiple times. At the opposite extreme, our LOUD problem models a building block being expressed each time it is present in the tree. In between, the ON-OFF problem provides tunability of building block expression. A parameter controls the frequency with which a 'function' can suppress the expression of the subtrees below it, thus affecting how frequently a tree expresses a building block. This series of experiments not only validates the population-sizing relationship, but also empirically illustrates the relationship between population size and problem difficulty, solution complexity, bloat and tree structure.

We proceed as follows. The next section gives a brief overview of past work in developing facetwise population-sizing models in both GAs and GP. In Section 3 we concisely review the derivation by Goldberg, Deb, and Clark (1992) of a population-sizing equation for GAs. Section 4 provides GP-equivalent definitions of building blocks, competitions (a.k.a. partitions), trials, cardinality and building-block size. In Section 5 we follow the logical steps of Goldberg, Deb, and Clark (1992) while factoring in GP perspectives to derive a general GP population-sizing equation. In Section 6 we derive and empirically verify the population sizes for model problems that span the range of variable BB presence and expression probability. Finally, the concluding section summarizes the paper and provides key conclusions of the study.



2 Background 



One of the key achievements of GA theory is the identification of building-block decision making as a statistical problem (Holland, 1973). Holland (1973) illustrated this using a $2^k$-armed bandit model. Based on Holland's work, De Jong (1975) proposed equations for the 2-armed bandit problem without using Holland's assumption of foresight. He recognized the importance of noise in the decision-making process. He also proposed a population-sizing model based on the signal and noise characteristics of a problem. De Jong's suggestion went unimplemented until the study by Goldberg and Rudnick (1991). Goldberg and Rudnick computed the fitness variance using Walsh analysis and proposed a population-sizing model based on the fitness variance.

A subsequent work (Goldberg, Deb, & Clark, 1992) proposed an estimate of the population size that controlled decision-making errors. Their model was based on deciding correctly between the best and the next best BB in a partition in the presence of noise arising from adjoining BBs. This noise is termed collateral noise (Goldberg & Rudnick, 1991). The model proposed by Goldberg et al. (1992) yielded practical population-sizing bounds for selectorecombinative GAs. More recently, Harik, Cantú-Paz, Goldberg, and Miller (1999) refined the population-sizing model proposed by Goldberg, Deb, and Clark (1992). Harik et al. proposed a tighter bound on the population size required for selectorecombinative GAs. They incorporated both the initial BB supply model and the decision-making model in the population-sizing relation. They also eliminated the requirement that only successful decision making in the first generation results in convergence to the optimum. To eliminate this requirement, they modeled the decision making in subsequent generations using the well-known gambler's ruin model (Feller, 1970). Miller (1997) extended the population-sizing model to noisy environments, and Cantú-Paz (2000) applied it to parallel GAs.

While population sizing in genetic algorithms has been successfully studied with the help of facetwise and dimensional models, similar efforts in genetic programming are still in the early stages. Recently, we developed a population-sizing model to ensure the presence of all raw building blocks in the initial population (Sastry, O'Reilly, Goldberg, & Hill, 2003). We first derived the exact population size to ensure adequate supply for a model problem named ORDER. ORDER has an expression mechanism that models how a primitive in GP is expressed depending on its spatial context. We empirically validated our supply-driven population size result for ORDER under two different fitness functions: UNITATION, where each primitive is a building block with uniform fitness contribution, and DECEPTION, where each of m subgroups, each consisting of k primitives, has its fitness computed using a deceptive trap function.

After dealing specifically with ORDER, in which a building block can be expressed at most once per tree, we considered the general case of ensuring an adequate building block supply where every building block in a tree is always expressed. This is analogous to a GP problem that exhibits no bloat. In this case, the supply equation does not have to account for subtrees that are present yet do not contribute to fitness. This supply-based population-size equation is:

$$n = \frac{2^k \kappa}{\lambda}\left(\log \kappa - \log \epsilon\right) \qquad (1)$$

where $\kappa$ is the number of competing building blocks in the partition (the size of the building-block competition), $k$ is the building-block size, $\epsilon$ is the supply error and $\lambda$ is the average tree size.
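As a quick numerical aid, Equation 1 can be evaluated directly. The following is a minimal Python sketch that assumes natural logarithms; the function name `supply_population_size` and its parameter names are ours, not the paper's, and the inputs in the usage line are arbitrary illustrations.

```python
import math


def supply_population_size(k: int, kappa: float, avg_tree_size: float, eps: float) -> float:
    """Supply-based population size of Equation 1 (every BB present is expressed).

    k             -- building-block size
    kappa         -- number of competing building blocks in the partition
    avg_tree_size -- average tree size, lambda
    eps           -- tolerated supply error, epsilon (0 < eps < 1)
    Natural logarithms are assumed here.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return (2 ** k) * kappa / avg_tree_size * (math.log(kappa) - math.log(eps))


# Illustrative inputs only: k = 1, kappa = 3, lambda = 15, eps = 0.01.
print(round(supply_population_size(1, 3, 15, 0.01), 2))
```

Note that $\lambda$ appears in the denominator: under this no-bloat idealization, larger trees carry more trials per individual and so reduce the required population size.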

In the context of supply, to finally address the reality of bloat, we noted that the combined probability of a building block being present in the population and its probability of being expressed must be computed and amalgamated into the supply derivation. This implies that Equation 1, though conservative under the assumed condition that every raw building block must be present in the initial population, is an underestimate in terms of accounting for bloat. Overall, the building block supply analysis yielded insight into how two salient properties of GP, building block expression and tree structure, influence building block supply and thus population size. Building block expression manifests itself in 'real life' as the phenomenon of bloat in GP. Average tree size in GP typically increases as a result of the interaction of selection, crossover and program degeneracy.

As a next step, this study derives a decision-making based population-sizing model. We employ the methodology Goldberg, Deb, and Clark (1992) used for deriving a population-sizing relationship for GAs. In this method, the population size is chosen so that the population contains enough competing building blocks that decisions between two building blocks can be made with a pre-specified confidence. Compared to the GA derivation, there are two significant differences. First, the collateral noise in fitness arises from a variable quantity of expressed BBs. Second, the number of trials of a BB, rather than one per individual as in the GA case, depends on tree structure and on whether a BB that is present in a tree is expressed. In the GP case, the variable $\kappa$, related to cardinality (e.g. the binary alphabet of a simple GA) and building-block defining length, is considerably larger because GP problems typically use larger primitive sets. It is incorporated into the relationship by considering BB expression and presence.

Figure 1: Two competing building blocks of size k: one is the best BB, $H_1$, and the other is the second-best BB, $H_2$.

Before presenting the decision-making model for GP, we briefly discuss the population-sizing model of Goldberg, Deb, and Clark (1992) in the following section.



3 GA Population Sizing from the Perspective of Competing Building Blocks

The derivational foundation for our GP population-sizing equation is the 1992 result for the selectorecombinative GA by Goldberg, Deb, and Clark (1992), entitled "Genetic Algorithms, Noise and the Sizing of Populations". The paper considers how the GA can derive accurate estimates of BB fitness in the presence of detrimental noise. It recognizes that, while selection is the principal decision maker, it distinguishes among individuals based on fitness and not by considering BBs. Therefore, there is a possibility that an inferior BB gets selected over a better BB in a competition due to noisy observed contributions from adjoining BBs that are also engaged in competitions.

To derive a relation for the probability of deciding correctly between competing BBs, the authors considered two individuals, one with the best BB and the other with the second-best BB in the same competition (Goldberg, Deb, & Clark, 1992).



Let $i_1$ and $i_2$ be these two individuals with $m$ non-overlapping BBs of size $k$, as shown in Figure 1. Individual $i_1$ has the best BB, $H_1$ (111···111 in Figure 1), and individual $i_2$ has the second-best BB, $H_2$ (000···000 in Figure 1). The fitness values of $i_1$ and $i_2$ are $f_{H_1}$ and $f_{H_2}$, respectively. To derive the probability of correct decision making, we first recognize that the fitness distribution of the individuals containing $H_1$ and $H_2$ is approximately Gaussian, since we have assumed an additive fitness function and the central limit theorem applies. Two possible fitness distributions of individuals containing BBs $H_1$ and $H_2$ are illustrated in Figure 2.

The distance between the mean fitness of individuals containing $H_1$, $\bar{f}_{H_1}$, and the mean fitness of individuals containing $H_2$, $\bar{f}_{H_2}$, is the signal, $d$. That is,

$$d = \bar{f}_{H_1} - \bar{f}_{H_2} \qquad (2)$$

Figure 2: Fitness distribution of individuals in the population containing the two competing building blocks, the best BB $H_1$ and the second-best BB $H_2$: (a) few samples; (b) lots of samples. When the two fitness distributions overlap, low sampling increases the likelihood of estimation error. When sampling around each mean fitness is increased, the fitness distributions are less likely to be inaccurately estimated.

Recognize that the probability of correctly deciding between $H_1$ and $H_2$ is equivalent to the probability that $f_{H_1} - f_{H_2} > 0$. Also, since $f_{H_1}$ and $f_{H_2}$ are normally distributed, $f_{H_1} - f_{H_2}$ is also normally distributed with mean $d$ and variance $\sigma^2_{H_1} + \sigma^2_{H_2}$, where $\sigma^2_{H_1}$ and $\sigma^2_{H_2}$ are the fitness variances of individuals containing $H_1$ and $H_2$, respectively. That is,

$$f_{H_1} - f_{H_2} \sim \mathcal{N}\left(d,\ \sigma^2_{H_1} + \sigma^2_{H_2}\right) \qquad (3)$$

The probability of correct decision making, $p_{dm}$, is then given by the cumulative distribution function of a unit normal variate evaluated at the signal-to-noise ratio:

$$p_{dm} = \Phi\left(\frac{d}{\sqrt{\sigma^2_{H_1} + \sigma^2_{H_2}}}\right) \qquad (4)$$

Alternatively, the probability of making an error on a single trial of each BB can be estimated by finding the probability $\alpha$ such that

$$z^2(\alpha) = \frac{d^2}{\sigma^2_{H_1} + \sigma^2_{H_2}} \qquad (5)$$

where $z(\alpha)$ is the ordinate of a unit, one-sided normal deviate. Notationally, $z(\alpha)$ is shortened to $z$.
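Equations 4 and 5 can be checked numerically. This Python sketch uses only the standard library's `statistics.NormalDist`; the function names are illustrative, and the example values of $d$ and the two variances are arbitrary.

```python
from math import sqrt
from statistics import NormalDist


def prob_correct_decision(d: float, var_h1: float, var_h2: float) -> float:
    """Equation 4: P(f_H1 - f_H2 > 0) when the difference is N(d, var_h1 + var_h2)."""
    return NormalDist().cdf(d / sqrt(var_h1 + var_h2))


def z_squared(d: float, var_h1: float, var_h2: float) -> float:
    """Equation 5: z^2(alpha), the squared signal-to-noise ratio."""
    return d * d / (var_h1 + var_h2)


p_dm = prob_correct_decision(1.0, 0.5, 0.5)
alpha = 1.0 - p_dm  # probability of deciding wrongly on a single trial
# The one-sided deviate at alpha recovers z, so its square matches Equation 5.
print(p_dm, alpha, NormalDist().inv_cdf(1.0 - alpha) ** 2, z_squared(1.0, 0.5, 0.5))
```

The two views are equivalent: choosing an acceptable error $\alpha$ fixes the required squared signal-to-noise ratio, and vice versa.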

Now, consider the BB variance, $\sigma^2_{H_1}$ (and $\sigma^2_{H_2}$): since it is assumed that the fitness function is the sum of $m$ independent subfunctions, each of size $k$, $\sigma^2_{H_1}$ (and similarly $\sigma^2_{H_2}$) is the sum of the variances of the adjoining $m-1$ subfunctions. Also, since it is assumed that the $m$ partitions are uniformly scaled, the variance of each subfunction is equal to the average BB variance, $\sigma^2_{bb}$. Therefore,

$$\text{GA BB variance:}\quad \sigma^2_{H_1} = \sigma^2_{H_2} = (m-1)\,\sigma^2_{bb} \qquad (6)$$
A population-sizing equation was derived from this error probability by recognizing that, as the number of trials, $\tau$, increases, the variance of the fitness is decreased by a factor equal to the trial quantity. Substituting Equation 6 into Equation 5 then gives

$$z^2(\alpha) = \frac{d^2}{2(m-1)\,\sigma^2_{bb}/\tau} \qquad (7)$$

To derive the quantity of trials, $\tau$, assume a uniformly random population (of size $n$). Let $\chi$ represent the cardinality of the alphabet (2 for the GA) and $k$ the building-block size. For any individual, the probability of $H_1$ is $1/\kappa$, where $\kappa = \chi^k$. There is exactly one instance of the competition per individual, $\phi = 1$. Thus,

$$\tau = n \cdot p_{bb} \cdot \phi = n \cdot \frac{1}{\kappa} \cdot 1 = \frac{n}{\kappa} \qquad (8)$$

By rearrangement, and calling $z^2$ the coefficient $c$ (still a function of $\alpha$), a fairly general population-sizing relation was obtained:

$$n = 2c\,\chi^k\,(m-1)\,\frac{\sigma^2_{bb}}{d^2} \qquad (9)$$
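Equations 7 through 9 can be combined into a small calculator. The Python sketch below is illustrative: the parameter values ($c = 1.76$, binary alphabet, $k = 4$, $m = 16$, $\sigma^2_{bb} = 0.25$, $d = 1$) are arbitrary choices, not results from the paper, and the function names are ours. Plugging the resulting $n$ back through Equations 8 and 7 should recover $c$.

```python
def ga_population_size(c: float, chi: int, k: int, m: int, var_bb: float, d: float) -> float:
    """Equation 9: n = 2 c chi^k (m - 1) sigma_bb^2 / d^2."""
    return 2.0 * c * chi ** k * (m - 1) * var_bb / d ** 2


def ga_trials(n: float, chi: int, k: int, phi: float = 1.0) -> float:
    """Equation 8: expected trials of one BB, tau = n * (1 / chi^k) * phi."""
    return n * phi / chi ** k


def implied_c(tau: float, m: int, var_bb: float, d: float) -> float:
    """Equation 7 solved for z^2: d^2 / (2 (m - 1) sigma_bb^2 / tau)."""
    return d ** 2 / (2.0 * (m - 1) * var_bb / tau)


n = ga_population_size(c=1.76, chi=2, k=4, m=16, var_bb=0.25, d=1.0)
tau = ga_trials(n, chi=2, k=4)
print(n, tau, implied_c(tau, m=16, var_bb=0.25, d=1.0))
```

The round trip makes the structure of the derivation explicit: Equation 9 is simply Equation 7 solved for $\tau$, multiplied by $\kappa = \chi^k$.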

To summarize, the decision-making based population-sizing model in GAs consists of the following factors:

• Competition complexity, quantified by the total number of competing building blocks, $\chi^k$.

• Subcomponent complexity, quantified by the number of building blocks, $m$.

• Ease of decision making, quantified by the signal-to-noise ratio, $d/\sigma_{bb}$.

• Probabilistic safety factor, quantified by the coefficient $c$.

4 GP Definitions for a Population Sizing Derivation 

Most GP implementations reported in the literature use parse trees to represent candidate programs in the population (Langdon & Poli, 2002). We have assumed this representation in our analysis.



To simplify the analysis further, we consider the following: 

1. The primitive set of the GP tree is $\mathcal{F} \cup \mathcal{T}$, where $\mathcal{F}$ denotes the set of functions (interior nodes of a GP parse tree) and $\mathcal{T}$ denotes the set of terminals (leaf nodes of a GP parse tree).

2. The cardinality of $\mathcal{F}$ is $\chi_f$ and the cardinality of $\mathcal{T}$ is $\chi_t$.

3. The arity of all functions in the primitive set is two: all functions are binary, and thus the GP parse trees generated from the primitive set are binary.

We believe that our analysis could be extended to primitive sets containing functions with arity 
greater than two (non-binary trees). We also note that our assumption closely matches a common 
GP benchmark, symbolic regression, which frequently has arithmetic functions of arity two. 

As in our BB supply paper (Sastry, O'Reilly, Goldberg, & Hill, 2003), our analysis adopts a definition of a GP schema (or similarity template) called a "tree fragment". A tree fragment is a tree with at least one leaf that is a "don't care" symbol. This "don't care" symbol can be matched by any subtree (including degenerate leaf-only trees). As before, we are most interested in only the small set of tree fragments that are defined by three or fewer nodes. See Figure 3 for this set.

Figure 3: The smallest tree fragments in GP, panels (a) through (g). Fragments (c) and (d) have mirrors where the child is the 2nd parameter of the function. Likewise, fragment (f) has a mirror where the 1st and 2nd parameters of the function are reversed. Recall that a tree fragment is a similarity template: based on the similarity it defines, it also defines a competition. A tree fragment, in other words, is a competition. (At other times we have also used the term partition interchangeably with tree fragment or competition.)

The defining length of a tree fragment is the sum of its quantities of function symbols, $N_f$, and terminal symbols, $N_t$:

$$k = N_f + N_t \qquad (10)$$

Because a tree fragment is a similarity template, it also represents a competition. Since this paper is concerned with decision making, we will therefore use "competition" instead of "tree fragment". The size of a competition (i.e. how many BBs compete) is

$$\kappa = \chi_f^{N_f}\,\chi_t^{N_t} \qquad (11)$$
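Equations 10 and 11 are straightforward to evaluate. In this Python sketch the primitive-set sizes ($\chi_f = 4$ binary functions, $\chi_t = 8$ terminals) are hypothetical values chosen only for illustration, and the example is a three-node fragment with one function and two terminals.

```python
def defining_length(n_functions: int, n_terminals: int) -> int:
    """Equation 10: k = N_f + N_t."""
    return n_functions + n_terminals


def competition_size(chi_f: int, chi_t: int, n_functions: int, n_terminals: int) -> int:
    """Equation 11: kappa = chi_f^N_f * chi_t^N_t (number of BBs that compete)."""
    return chi_f ** n_functions * chi_t ** n_terminals


# Hypothetical primitive set: 4 binary functions and 8 terminals.
k = defining_length(1, 2)
kappa = competition_size(4, 8, 1, 2)
print(k, kappa)
```

Even this small fragment yields 256 competitors, which illustrates why $\kappa$ is typically much larger in GP than the $\chi^k = 2^3 = 8$ of a binary GA with the same $k$.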
As mentioned in Sastry, O'Reilly, Goldberg, and Hill (2003), because a tree fragment is defined without any positional anchoring, it can appear multiple times in a single tree. We denote the number of instances of a tree fragment that are present in a tree of size $\lambda$ (a.k.a. the quantity of a tree fragment in a tree) as $\phi$. This is equivalent to the instances of a competition, as $\phi$ is used in the GA case (see Equation 8). For full binary trees:

$$\phi \approx 2^{-k}\,\lambda \qquad (12)$$

Later, we will explain how $\phi$ describes the potential quantity of a BB per tree.



5 GP Population Sizing based on Decision Making 

We now proceed to derive a GP population-sizing relationship based on building block decision making. Preliminarily, unless noted, we make the same assumptions as the GA derivation of Section 3.

The first way the GP population-size derivation diverges from the GA case is in how BB fitness variance (i.e. $\sigma^2_{H_1}$ and $\sigma^2_{H_2}$) is estimated (for reference, see Equation 6). Recall that for the GA the source of a BB's fitness variance was collateral noise from the $(m-1)$ competitions of its adjoining BBs. In GP, the source of collateral noise is the average number of adjoining BBs present and expressed in each tree, denoted as $q^{expr}_{bb}$. Thus:

$$\text{GP BB variance:}\quad \sigma^2_{H_1} = \sigma^2_{H_2} = \left[q^{expr}_{bb}(m,\lambda) - 1\right]\sigma^2_{bb} \qquad (13)$$



Thus, the probability of making an error on a single trial of the BB can be estimated by finding the probability $\alpha$ such that

$$z^2(\alpha) = \frac{d^2}{2\left[q^{expr}_{bb}(m,\lambda) - 1\right]\sigma^2_{bb}} \qquad (14)$$

The second way the GP population-size derivation diverges from the GA case is in how the number of trials of a BB is estimated (for reference, see Equation 8). As with the GA, for GP we assume a uniformly distributed population of size $n$. In GP the probability of a trial of a particular BB must account for it being both present, $1/\kappa$, and expressed in an individual (or tree), which we denote as $p^{expr}_{bb}$. So, in GP:

$$\tau = \frac{1}{\kappa}\cdot p^{expr}_{bb}\cdot \phi \cdot n \qquad (15)$$
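Following the GA derivation, dividing the variance in Equation 14 by the number of trials $\tau$ of Equation 15 and solving for $n$ gives the population-size relationship for GP:

$$n = 2c\,\kappa\,\frac{\sigma^2_{bb}}{d^2}\,\frac{q^{expr}_{bb}(m,\lambda) - 1}{\phi\; p^{expr}_{bb}} \qquad (16)$$

where $c = z^2(\alpha)$ is the square of the ordinate of a one-sided standard Gaussian deviate at a specified error probability $\alpha$. For low error values, $c$ can be obtained by the usual approximation for the tail of a Gaussian distribution: $\alpha \approx \exp(-c/2)/\sqrt{2\pi c}$.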
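The coefficient $c$ can be computed either exactly from the inverse normal CDF or from the tail approximation quoted above. The Python sketch below does both; `c_tail_approx` solves $\alpha = \exp(-c/2)/\sqrt{2\pi c}$ by bisection, relying on the right-hand side being strictly decreasing in $c > 0$. The function names and bracket limits are our own choices.

```python
import math
from statistics import NormalDist


def c_exact(alpha: float) -> float:
    """c = z^2(alpha), with z the one-sided standard normal deviate for tail mass alpha."""
    return NormalDist().inv_cdf(1.0 - alpha) ** 2


def c_tail_approx(alpha: float, lo: float = 1e-9, hi: float = 200.0, iters: int = 200) -> float:
    """Solve alpha = exp(-c/2) / sqrt(2*pi*c) for c by bisection.

    The right-hand side decreases monotonically for c > 0, so the root is unique
    whenever it lies inside [lo, hi].
    """
    def excess(c: float) -> float:
        return math.exp(-c / 2.0) / math.sqrt(2.0 * math.pi * c) - alpha

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:  # approximate tail still larger than alpha: c must grow
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for alpha in (0.25, 0.05, 1e-4):
    print(alpha, round(c_exact(alpha), 3), round(c_tail_approx(alpha), 3))
```

As the text indicates, the tail approximation is intended for low error values; for moderate $\alpha$ it can differ substantially from the exact $z^2(\alpha)$.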

Obviously, it is not always possible to factor real-world problems in terms of this population-sizing model. A practical approach would first approximate $\phi = 2^{-k}\lambda$ trials per tree (the full binary tree assumption). Then, estimate the size of the shortest program that will solve the problem (one might regard this as the Kolmogorov complexity of the problem, $\lambda_k$), and choose a multiple of this, $\lambda = c_k\lambda_k$, for $\lambda$ in the model. In this case, $q^{expr}_{bb} = c_k m_k$. To ensure an initial supply of building blocks that is sufficient to solve the problem, the initial population should be initialized with trees of size $\lambda$. Therefore, the population sizing in this case can be written as

$$n = \frac{c\,\kappa\,\sigma^2_{bb}\,(c_k m_k - 1)\,2^{k+1}}{d^2\,c_k\,\lambda_k\,p^{expr}_{bb}} \qquad (17)$$
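Equations 16 and 17 can be sketched as a pair of Python functions. All inputs are user-supplied estimates; the parameter names are ours, and the numbers in the usage lines are arbitrary illustrations rather than values from the paper. As a sanity check, setting $\phi = 1$, $p^{expr}_{bb} = 1$ and $q^{expr}_{bb} = m$ reduces Equation 16 to the GA result, Equation 9.

```python
def gp_population_size(c, kappa, var_bb, d, q_expr, phi, p_expr):
    """Equation 16: n = 2 c kappa sigma_bb^2 (q_expr - 1) / (d^2 phi p_expr)."""
    return 2.0 * c * kappa * var_bb * (q_expr - 1.0) / (d ** 2 * phi * p_expr)


def gp_population_size_practical(c, kappa, var_bb, d, k, c_k, m_k, lambda_k, p_expr):
    """Equation 17: Equation 16 with lambda = c_k * lambda_k, phi = 2^-k * lambda, q_expr = c_k * m_k."""
    lam = c_k * lambda_k
    phi = 2.0 ** (-k) * lam  # full binary tree approximation (Equation 12)
    return gp_population_size(c, kappa, var_bb, d, c_k * m_k, phi, p_expr)


# GA special case (phi = 1, p_expr = 1, q_expr = m) reproduces Equation 9.
print(gp_population_size(1.76, 2 ** 4, 0.25, 1.0, 16, 1.0, 1.0))
# Illustrative practical estimate; every argument here is a made-up example value.
print(gp_population_size_practical(2.71, 3, 0.25, 1.0, 1, 2, 16, 31, 1.0 / 3.0))
```

Routing Equation 17 through Equation 16 keeps the two forms consistent by construction; only the substitutions for $\lambda$, $\phi$ and $q^{expr}_{bb}$ differ.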

Similar to the GA population-sizing model, the decision-making based population-sizing model in GP consists of the following factors:

• Competition complexity, quantified by the total number of competing building blocks, $\kappa$.

• Ease of decision making, quantified by the signal-to-noise ratio, $d/\sigma_{bb}$.

• Probabilistic safety factor, quantified by the coefficient $c$.

• Number of subcomponents, which, unlike in GA population sizing, depends not only on the minimum number of building blocks required to solve the problem, $m_k$, but also on the tree size $\lambda$, the size of the problem primitive set, and how bloat factors into trees (quantified by $p^{expr}_{bb}$).
Figure 4: A candidate solution for a 4-primitive ORDER problem. The output of the program is $\{X_1, \bar{X}_2, X_4\}$ and its fitness is 2.

6 Sizing Model Problems 

This section derives the components of the population-sizing model (Equation 16) for three test problems, ORDER, LOUD, and ON-OFF. We develop the population-sizing equation for each of these problems and verify them with empirical results. In all experiments we assume that $\alpha = 1/m$ and thus derive $c$. Table 1 shows some of these values. For all empirical experiments the initial population is randomly generated with either full trees or by the ramped half-and-half method. The trees were allowed to grow up to a maximum size of 1024 nodes. We used tournament selection with a tournament size of 4 in obtaining the empirical results. We used subtree crossover with a crossover probability of 1.0 and retained 5% of the best individuals from the previous population. A GP run was terminated when either the best individual was obtained or a predetermined number of generations was exceeded. The average number of BBs correctly converged in the best individuals was computed over 50 independent runs. The minimum population size required such that $m-1$ BBs converge to the correct value is determined by a bisection method (Sastry, 2001); that is, the error tolerance is $\alpha = 1/m$. The results for population size and convergence time were averaged over 30 such bisection runs, while the results for the number of function evaluations were averaged over 1500 independent runs. We start with population sizing for ORDER, where a building block can be expressed at most once in a tree.
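The bisection search for the minimum population size can be sketched as follows. This is a simplified illustration under our own assumptions, not the exact procedure of Sastry (2001): `succeeds(n)` is a hypothetical predicate that should return True when a run (or batch of runs) with population size `n` converges at least $m-1$ BBs correctly, success is assumed to be monotone in `n`, and the default relative tolerance of 10% is arbitrary.

```python
from typing import Callable


def min_population_size(succeeds: Callable[[int], bool], n0: int = 16, rel_tol: float = 0.1) -> int:
    """Bisection search for the smallest n with succeeds(n) True.

    Assumes success is monotone in n. First doubles n until a run succeeds,
    then bisects between the last failing and the first succeeding size until
    the bracket is no wider than max(1, rel_tol * upper bound).
    """
    lo, hi = 0, n0  # invariant: lo is treated as failing, hi succeeds once found
    while not succeeds(hi):
        lo, hi = hi, hi * 2
    while hi - lo > max(1, rel_tol * hi):
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy deterministic predicate standing in for an actual GP run.
print(min_population_size(lambda n: n >= 137), min_population_size(lambda n: n >= 137, rel_tol=0.0))
```

With a real, noisy GP run the predicate's outcome varies between calls, which is why the experiments average results over repeated bisection runs.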



m    8     16    32    64    128
c    0.97  1.76  2.71  3.77  4.89

Table 1: Values of $c = z^2(\alpha)$ used in the population-sizing equation.



6.1 ORDER: At most one expression per building block per tree 

ORDER is a simple, yet intuitive, expression mechanism, which makes it amenable to analysis and modeling (Goldberg & O'Reilly, 1998; O'Reilly & Goldberg, 1998). The primitive set of ORDER consists of the primitive JOIN of arity two and complementary primitive pairs $(X_i, \bar{X}_i)$, $i = 1, 2, \ldots, m$, of arity zero (terminals). A candidate solution of the ORDER problem is a binary tree with the JOIN primitive at the internal nodes and either $X_i$'s or $\bar{X}_i$'s at its leaves. The candidate solution's expression is determined by parsing the program tree inorder (from left to right). The program expresses the value $X_i$ if, during the inorder parse, an $X_i$ leaf is encountered before its complement $\bar{X}_i$. Furthermore, only unique primitives are expressed in ORDER during the inorder parse.

For each $X_i$ (or $\bar{X}_i$) that is expressed, a fitness contribution is accredited: one unit for $X_i$ and none for its complement. That is,

$$f_1(x_i) = \begin{cases} 1 & \text{if } x_i \in \{X_1, X_2, \ldots, X_m\} \\ 0 & \text{otherwise} \end{cases} \qquad (18)$$

The fitness function for ORDER is then defined as

$$F(x) = \sum_{i=1}^{m} f_1(x_i) \qquad (19)$$

where $x$ is the set of primitives expressed by the tree. The output for the optimal solution of a $2m$-primitive ORDER problem is $\{X_1, X_2, \ldots, X_m\}$, and its fitness value is $m$. The building blocks in ORDER are the primitives, $X_i$, that are part of the subfunctions that reduce error (alternatively, improve fitness). The size of the shortest perfect program is $\lambda_k = 2m - 1$.
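The ORDER expression rule and Equations 18 and 19 can be sketched in Python. The tree encoding is our own assumption: internal nodes are `("JOIN", left, right)` tuples and leaves are `(i, positive)` pairs, where `positive` is True for $X_i$ and False for its complement $\bar{X}_i$.

```python
from typing import List, Tuple, Union

Leaf = Tuple[int, bool]    # (index i, True for X_i / False for the complement)
Tree = Union[Leaf, tuple]  # ("JOIN", left, right) or a Leaf


def inorder_leaves(tree: Tree) -> List[Leaf]:
    """Leaves in left-to-right (inorder) order; only JOIN appears at internal nodes."""
    if tree[0] == "JOIN":
        return inorder_leaves(tree[1]) + inorder_leaves(tree[2])
    return [tree]


def order_expression(tree: Tree) -> List[Leaf]:
    """Keep the first leaf seen for each index; later X_i or complements are not expressed."""
    seen = set()
    expressed = []
    for i, positive in inorder_leaves(tree):
        if i not in seen:
            seen.add(i)
            expressed.append((i, positive))
    return expressed


def order_fitness(tree: Tree) -> int:
    """Equations 18-19: one unit for every expressed uncomplemented X_i."""
    return sum(1 for _, positive in order_expression(tree) if positive)


t = ("JOIN", ("JOIN", (1, True), (2, False)), ("JOIN", (1, False), (2, True)))
print(order_expression(t), order_fitness(t))
```

For the optimal solution of a $2m$-primitive problem every index is expressed uncomplemented, so `order_fitness` returns $m$.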

For example, consider a candidate solution for a 4-primitive ORDER problem as shown in Figure 4. The sequence of leaves for the tree is $\{X_1, \bar{X}_2, \bar{X}_2, X_4, \bar{X}_1, X_2\}$, the expression during the inorder parse is $\{X_1, \bar{X}_2, X_4\}$, and its fitness is 2. For more details, motivations, and analysis
of the ORDER problem, the interested reader should refer elsewhere (Goldberg & O'Reilly, 1998; O'Reilly & Goldberg, 1998).

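For the ORDER problem, we can easily see that $\sigma^2_{bb} = 0.25$, $d = 1$, and $k = 1$ (each building block is a single terminal, so by Equation 11, $\kappa = 2m$). From Sastry, O'Reilly, Goldberg, and Hill (2003), we know that

$$p^{expr}_{bb} = \frac{2m}{\lambda}\left[1 - \exp\left(-\frac{\lambda}{2m}\right)\right] \qquad (20)$$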



Additionally, for ORDER, $q^{expr}_{bb}$ is given by

$$q^{expr}_{bb} = \sum_{i=0}^{m-1} \cdots \qquad (21)$$

where $n_l$ is the average number of leaf nodes per tree in the population. The derivation of the above equation is involved and detailed; it is provided in the Appendix.

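Substituting the above relations (Equations 20 and 21), together with $\phi = 2^{-k}\lambda$ from Equation 12, in the population-sizing model (Equation 16), we obtain the following population-sizing equation for ORDER:

$$n = 2^{k-1} z^2(\alpha)\,\frac{q^{expr}_{bb} - 1}{1 - \exp\left(-\frac{\lambda}{2m}\right)} \qquad (22)$$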



The above population-sizing equation is verified with empirical results in Figure 5. The initial population was randomly generated with either full trees or by the ramped half-and-half method with trees of heights $h \in [h_k - 1, h_k + 1]$, where $h_k$ is the minimum tree height with an average of $2m$ leaf nodes.

As shown in Figure 6, we empirically observed that the convergence time and the number of function evaluations scale linearly and cubically, respectively, with the program size of the most compact solution, $\lambda_k$. Since the number of function evaluations is approximately the product of the population size and the convergence time, we can deduce from this empirical observation that the population size for ORDER scales quadratically with the program size of the most compact solution. For ORDER, $\lambda_k = 2m - 1$.

Figure 5: Empirical validation of the population-sizing model (Equation 22) for the ORDER problem. Tree height $h_k = \log_2 m$ and $\lambda_k = 2m - 1 = 2^{h_k+1} - 1$.

To summarize for the ORDER problem, where a building block is expressed at most once per individual, the population size scales as $n = O(2^k \lambda_k^2)$, the convergence time scales as $t_c = O(\lambda_k)$, and the total number of function evaluations required to obtain the optimal solution scales as $n_{fe} = O(2^k \lambda_k^3)$.
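Because the number of function evaluations is roughly the population size times the number of generations to convergence, $n_{fe} \approx n\,t_c$, power-law exponents add on log-log axes: if $t_c \propto \lambda_k$ and $n_{fe} \propto \lambda_k^3$, then $n \propto \lambda_k^2$. The Python sketch below estimates an exponent by least squares on log-transformed data; the data are synthetic power laws generated inside the snippet, not measurements from the paper.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x): the exponent b in y ~ a * x**b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic example only: t_c linear and n quadratic in lambda_k, so n_fe = n * t_c is cubic.
lambda_k = [15, 31, 63, 127, 255]
t_c = [0.5 * lam for lam in lambda_k]
n = [3.0 * lam ** 2 for lam in lambda_k]
n_fe = [a * b for a, b in zip(n, t_c)]
print(loglog_slope(lambda_k, n_fe) - loglog_slope(lambda_k, t_c))
```

With real, noisy measurements the fitted slopes are only estimates, so the inferred population-size exponent inherits their uncertainty.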

6.2 LOUD: Every building block in a tree is expressed 

In ORDER, a building block could be expressed at most once in a tree; however, in many GP problems a building block can be expressed multiple times in an individual. Indeed, an extreme case is when every building block occurrence is expressed. One such problem is a modified version of a test problem proposed by Soule (2002) (see also Soule & Heckendorn, 2002; Soule, 2003), which we call LOUD.

In LOUD, the primitive set consists of an "add" function of arity two and three constant terminals: 0, 1 and 4. The objective is to find an optimal number of fours and ones. That is, for an individual with $i$ 4s and $j$ 1s, the fitness function (to be minimized) is given by

$$F(x) = |i - m_4| + |j - m_1| \qquad (23)$$

Therefore, even though a zero is expressed, it does not contribute to fitness. Furthermore, a 4 or 1 is expressed each time it appears in an individual, and each occurrence contributes to the fitness value of the individual. Moreover, the problem size is $m = m_4 + m_1$ and $\lambda_k = 2m - 1$.
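A minimal Python sketch of the LOUD fitness in Equation 23. The tree encoding, nested `("add", left, right)` tuples with integer leaves 0, 1 or 4, is our own choice; since every occurrence of a 4 or 1 is expressed, fitness reduces to counting leaves.

```python
from typing import Union

Tree = Union[int, tuple]  # a terminal 0, 1 or 4, or ("add", left, right)


def count_terminal(tree: Tree, value: int) -> int:
    """Number of leaves equal to `value`; every occurrence counts (all are expressed)."""
    if isinstance(tree, tuple):
        _, left, right = tree
        return count_terminal(left, value) + count_terminal(right, value)
    return 1 if tree == value else 0


def loud_fitness(tree: Tree, m4: int, m1: int) -> int:
    """Equation 23: |i - m4| + |j - m1|; 0 is optimal and larger values are worse."""
    i = count_terminal(tree, 4)
    j = count_terminal(tree, 1)
    return abs(i - m4) + abs(j - m1)


example = ("add", ("add", 4, 0), ("add", 1, 4))
print(loud_fitness(example, m4=2, m1=1), loud_fitness(example, m4=3, m1=3))
```

Zeros are parsed like any other leaf but never change the counts, which is exactly the "expressed but non-contributing" behavior described above.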

Figure 6: Empirical results for the convergence time and the total number of function evaluations required to obtain the global solution for the ORDER problem. Note that $\lambda_k = 2m - 1$, so convergence time and the number of function evaluations scale linearly and cubically with the program size of the most compact solution, or problem difficulty. The implication is that the population size for the ORDER problem scales quadratically.

For the LOUD problem the building blocks are "4" and "1". It is easy to see that for LOUD, $\sigma^2_{bb} = 0.25$, $d = 1$, $\phi = \lambda/2$, and $p^{expr}_{bb} = 1/3$. Furthermore, the average number of building blocks expressed is given by $q^{expr}_{bb} = 2n_l/3 \approx \lambda/3$. Substituting these values in the population-sizing model (Equation 16), we obtain



$$n = \frac{2\cdot 3^{k} z^2(\alpha)\,\sigma^2_{bb}}{d^2}\,\frac{\lambda/3 - 1}{(\lambda/2)(1/3)} \qquad (24)$$



The above population-sizing equation is verified with empirical results in Figure 7. The initial population was randomly generated by the ramped half-and-half method with trees of heights $h \in [2, 7]$, yielding an average tree size of 4.1 (this value is analytically 4.5).

We empirically observed that the convergence time was constant with respect to the problem size, and that the number of function evaluations scales sub-linearly with the program size of the most compact solution, $\lambda_k$. From this empirical observation, we can deduce that the population size for LOUD scales sub-linearly with the program size of the most compact solution. For LOUD, $\lambda_k = 2m - 1$.

To summarize for the LOUD problem, where a building block is expressed each time it occurs in an individual, the population size scales as $n = O(3^k \lambda_k^{0.5})$, the convergence time is almost constant with the problem size, and the total number of function evaluations required to obtain the optimal solution scales as $n_{fe} = O(3^k \lambda_k^{0.5})$.



6.3 ON-OFF: Tunable building block expression

In the previous sections we considered two extreme cases, one where a building block could be expressed at most once in an individual and the other where every building block occurrence is expressed. However, in GP problems some of the building blocks are usually expressed and others are not. For example, a building block in a non-coding segment is not expressed and does not contribute to the fitness. Empirically, Luke (2000a) calculates the percentage of inviable nodes in runs of the 6- and 11-bit multiplexer problems and symbolic regression over the course of a run. This value is seen to vary between problems and to change over generations. Therefore, the third test function, which we call ON-OFF, is one in which the probability of a building block being expressed is tunable.

Figure 7: Empirical validation of the population-sizing model (Equation 24) and empirical results for the total number of function evaluations required to obtain the global solution for the LOUD problem. Convergence time was constant with respect to problem size. Note that $\lambda_k = 2m - 1$, so the number of function evaluations scales sub-linearly with the program size of the most compact solution, or problem difficulty. The implication is that the population size for the LOUD problem scales sub-linearly.

In ON-OFF, the primitive set consists of two functions of arity two, EXP and $\overline{\mathrm{EXP}}$, and two
terminals, $X_1$ and $X_2$. The function EXP expresses its child nodes, while $\overline{\mathrm{EXP}}$ suppresses them. Therefore,
a leaf node is expressed only when all of its ancestor nodes are the primitive EXP. This problem can
potentially approximate some bloat scenarios of standard GP problems, such as symbolic regression
and the multiplexer problems, where invalidators are responsible for nullifying a building block's effect
(Luke, 2000a). The probability of expressing a building block can be tuned by controlling the
frequency of selecting EXP for an internal node in the initial tree.
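A minimal Python sketch of the expression rule just described. The `Node` tree representation and the name `NOEXP`, standing in for the barred suppressor function, are hypothetical choices for illustration; a leaf counts as expressed only if every one of its ancestors is `EXP`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    """Internal node: op is "EXP" (expresses children) or "NOEXP" (suppresses them)."""
    op: str
    left: "Union[Node, str]"
    right: "Union[Node, str]"


def expressed_terminals(tree: Union[Node, str], expressed: bool = True) -> List[str]:
    """Return, left to right, the leaves of `tree` whose ancestors are all EXP."""
    if isinstance(tree, str):  # a leaf: "X1" or "X2"
        return [tree] if expressed else []
    # Children stay expressed only if this node and every node above it are EXP.
    child_expressed = expressed and tree.op == "EXP"
    return (expressed_terminals(tree.left, child_expressed)
            + expressed_terminals(tree.right, child_expressed))
```

For the tree `Node("EXP", Node("EXP", "X1", "X2"), Node("NOEXP", "X1", "X1"))`, only the two leaves under the left `EXP` survive, so the call returns `["X1", "X2"]`.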

Similar to LOUD, the objective for ON-OFF is to find an optimal number of each terminal. That
is, for an individual with $i$ expressed $X_1$s and $j$ expressed $X_2$s, the fitness function is given by

$$ F(x) = \left| i - m_{X_1} \right| + \left| j - m_{X_2} \right|. \qquad (25) $$

The problem size is $m = m_{X_1} + m_{X_2}$, and $\lambda_k = 2m - 1$.

For example, consider a candidate solution for the ON-OFF problem as shown in Figure 8. The
terminals that are expressed are $\{X_1, X_1, X_1, X_2\}$, and the fitness is given by $|3 - m_{X_1}| + |1 - m_{X_2}|$.
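The fitness in Equation 25 depends only on how many of each terminal are expressed. The helper below is a sketch that takes the list of expressed terminals (for instance, the terminals collected under the expression rule above) and returns $|i - m_{X_1}| + |j - m_{X_2}|$; the function name and the target counts used in the usage note are hypothetical.

```python
from collections import Counter


def on_off_fitness(expressed, m_x1, m_x2):
    """Equation 25: |i - m_X1| + |j - m_X2|, where i and j count expressed X1s and X2s.

    The value reaches its minimum of 0 exactly when both counts hit their targets.
    """
    counts = Counter(expressed)  # missing terminals count as 0
    return abs(counts["X1"] - m_x1) + abs(counts["X2"] - m_x2)
```

With the expressed terminals of the worked example, `on_off_fitness(["X1", "X1", "X1", "X2"], 3, 1)` is 0, and with hypothetical targets $m_{X_1} = m_{X_2} = 2$ it is 2.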

For the ON-OFF problem, the building blocks are $X_1$ and $X_2$, $\sigma^2_{BB} = 0.25$, $d = 1$, $\phi = \lambda/2$,
and $p^{(BB)}_{\mathrm{exp}} = p_{\mathrm{exp}}$. Here, $p_{\mathrm{exp}}$ is the probability of a node being the primitive EXP. The average
number of building blocks expressed is given by $n_l\, p_{\mathrm{exp}} \approx \phi\, p_{\mathrm{exp}}$. Substituting these
values in the population-sizing model, we obtain

$$ n = 2^{k+1} z^2(\alpha)\, \frac{\sqrt{\lambda_k}}{4\, p_{\mathrm{exp}}}. \qquad (26) $$



The above population-sizing equation is verified with empirical results in Figure 9. The initial
population was randomly generated by the ramped half-and-half method with trees of heights
$h \in [h_k - 1, h_k + 1]$, where $h_k$ is the minimum tree height with an average of $m$ leaf nodes. We
empirically observed that the convergence time was linear with respect to the problem size, and that the
number of function evaluations scales sub-quadratically with the program size of the most compact
solution, $\lambda_k$. From this empirical observation, we can deduce that the population size for ON-OFF
scales sub-linearly with the program size of the most compact solution ($\lambda_k = 2m - 1$).
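The deduction above combines two empirical trends. A minimal sketch of that bookkeeping follows, assuming $n_{fe} \approx n\, t_c$ and power-law trends fitted by ordinary least squares on log-log axes; the fitting procedure and the function names are our assumptions, not necessarily how the experiments were analysed.

```python
import math


def fit_exponent(sizes, values):
    """Least-squares slope b of log(values) against log(sizes), i.e. values ~ a * sizes**b."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def population_exponent(sizes, n_fe, t_c):
    """Assuming n_fe ~ n * t_c, the exponent of n is the n_fe exponent minus the t_c exponent."""
    return fit_exponent(sizes, n_fe) - fit_exponent(sizes, t_c)
```

Fed synthetic (not measured) data with $n_{fe} \propto \lambda_k^{1.5}$ and $t_c \propto \lambda_k$, `population_exponent` returns 0.5, the sub-linear population exponent stated in the text.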

To summarize for the ON-OFF problem, where building block expression is tunable, the population
size scales as $n = O(2^k \lambda_k^{0.5} / p_{\mathrm{exp}})$, the convergence time scales linearly as $t_c = O(\lambda_k / p_{\mathrm{exp}})$,
and the total number of function evaluations required to obtain the optimal solution scales as
$n_{fe} = O(2^k \lambda_k^{1.5} / p^2_{\mathrm{exp}})$.
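The three ON-OFF scaling laws are mutually consistent under the same assumption used for LOUD, $n_{fe} \approx n\, t_c$; written out:

```latex
n_{fe} \approx n \, t_c
      = O\!\left(\frac{2^k \lambda_k^{0.5}}{p_{\mathrm{exp}}}\right)
        \cdot O\!\left(\frac{\lambda_k}{p_{\mathrm{exp}}}\right)
      = O\!\left(\frac{2^k \lambda_k^{1.5}}{p_{\mathrm{exp}}^{2}}\right).
```

One consequence worth noting: under these scalings, halving $p_{\mathrm{exp}}$ doubles the predicted population size but quadruples the predicted number of function evaluations.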



7 Conclusions 

This contribution is a second step towards a reliable and accurate model for sizing genetic programming
populations. In the first step, the model estimated the minimum population size required to
ensure that every building block was present with a given certainty in the initial population. We
accepted this conservative model (i.e., it oversized the population) because, in the process of deriving
it, we gained valuable insight into (a) what makes GP different from a GA in the sizing context and
(b) the implications of these differences. GP's larger alphabet, while it implies that GP needs larger
populations, was not a difficult factor to handle; bloat and the variable length of GP individuals
are more complicated.
Figure 9: Empirical validation of the population-sizing model (Equation 26) and empirical results
for the total number of function evaluations required to obtain the global solution for the ON-OFF
problem. Note that $\lambda_k = 2m - 1$. The convergence time scales linearly, $O(\lambda_k)$, and the number of
function evaluations scales sub-quadratically, $O(\lambda_k^{1.5})$, with the program size of the most compact
solution, or problem difficulty. Therefore, the population size for the ON-OFF problem scales sub-linearly, $O(\lambda_k^{0.5})$.



Moving to the second step, by considering a decision-making model (which is less conservative
than the BB supply model), we extended the GA decision-making model along these dimensions.
First, our model retains a term describing collateral noise from competing BBs (a term that depends on $m$ and $\lambda$), but it
recognizes that the quantity of these competitors depends on tree size and on the likelihood that the
BB is present and expresses itself (rather than behaving as an intron). Second, like its GA counterpart,
our model assumes that trials decrease BB fitness variance. However, what was simple in a
GA (one trial per population member) is more involved in GP. The
probability that a BB is present in a population member depends both on the likelihood that it is
present in lieu of another BB and expresses itself, and on the number of potential trials any BB has
in each population member.

The model shows that, to ensure correct decision making within an error tolerance, population
size must go up as the probability of error decreases, noise increases, alphabet cardinality increases,
the signal-to-noise ratio decreases, tree size decreases, and bloat frequency increases. This
matches intuition. There is an interesting critical trade-off with tree size when
determining population size: pressure for larger trees comes from the need to express all correct
BBs in the solution, while pressure for smaller trees comes from the need to reduce collateral noise
from competing BBs.

The model is conservative because "it assumes that decisions are made irrevocably during any
given generation. It sizes the population to ensure that the correct decision is made on average in
a single generation" (Goldberg, 2002). In this way, it is similar to the Schema Theorem. A more
accurate model would account for how correct decision making accumulates over the
course of a run, and how improper decision making can be rectified over the course of a run.

Because the model is based on statistical decision making, crossover does not
have to be incorporated. In GAs, crossover acts solely as a mixer or combiner of BBs. Interestingly,
in GP, crossover also interacts with selection, with the potential result that program size grows and
structure changes. When this happens, the frequency of bloat can also change (see Luke, 2000a;
Luke, 2000b, for examples of this with the multiplexer and symbolic regression). These changes in size,
structure and bloat frequency imply a much more complex model if one were to attempt to account
for decision making throughout a run. They also suggest that, when using the model as a rule of
thumb to size an initial population, it may prove more accurate to overestimate
bloat, in anticipation that subsequent tree growth will cause more bloat than is seen in the initial
population at its average tree size.

It appears difficult to use this model with real problems where, among the GP-specific
factors, the size of the most compact solution and the BB size are not known and the extent of bloat cannot be
estimated. For the GA model, the estimation of model factors has been addressed by
Reed, Minsker, & Goldberg (2000), who estimated the variance from the standard deviation of the
fitness of a large random population. In the GP case, this sampling population should be controlled
for average tree size. If a practitioner were willing to work with crude estimates of bloat, BB size
and most-compact-solution size, a multiple of the size of the most compact solution could be plugged
in as the tree size, and bloat could be used with that size to estimate the probability that a BB is
present and expressed, and the average number of BBs of the same size present and expressed in
each tree. In the future, we intend to experiment with the model on well-known toy GP problems
(e.g., multiplexer, symbolic regression) where bloat frequency and most-compact-solution size are
obtainable and simple choices for BB size exist, to see whether the ideal population size scales with
problem size within the order of complexity the model predicts.
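To make the variance-estimation step concrete, here is a sketch of a size-controlled sampling population. It is not the procedure of Reed, Minsker, & Goldberg (2000): it borrows the ON-OFF expression rule and fitness as a stand-in problem, and the uniform-split tree generator (`random_tree`), the function names and all parameter values are hypothetical. Every sampled tree has exactly `n_leaves` leaves, so all trees share the same size, $2\,n_{\text{leaves}} - 1$ nodes.

```python
import random
import statistics


def random_tree(n_leaves, p_exp, rng):
    """Hypothetical generator: a random binary tree with exactly n_leaves leaves.
    Internal nodes are (op, left, right) with op "EXP" (probability p_exp) or "NOEXP";
    leaves are "X1" or "X2" with equal probability."""
    if n_leaves == 1:
        return rng.choice(["X1", "X2"])
    split = rng.randint(1, n_leaves - 1)  # number of leaves sent to the left subtree
    op = "EXP" if rng.random() < p_exp else "NOEXP"
    return (op, random_tree(split, p_exp, rng), random_tree(n_leaves - split, p_exp, rng))


def expressed_counts(tree, expressed=True, counts=None):
    """Count expressed X1 and X2 leaves (a leaf is expressed only if all ancestors are EXP)."""
    if counts is None:
        counts = {"X1": 0, "X2": 0}
    if isinstance(tree, str):
        if expressed:
            counts[tree] += 1
        return counts
    op, left, right = tree
    child_expressed = expressed and op == "EXP"
    expressed_counts(left, child_expressed, counts)
    expressed_counts(right, child_expressed, counts)
    return counts


def sampled_fitness_variance(n_leaves, p_exp, m_x1, m_x2, samples=2000, seed=0):
    """Sample variance of the ON-OFF fitness |i - m_X1| + |j - m_X2| over random trees
    that all have the same size (2 * n_leaves - 1 nodes)."""
    rng = random.Random(seed)
    fitnesses = []
    for _ in range(samples):
        counts = expressed_counts(random_tree(n_leaves, p_exp, rng))
        fitnesses.append(abs(counts["X1"] - m_x1) + abs(counts["X2"] - m_x2))
    return statistics.variance(fitnesses)
```

Repeating the call for several values of `n_leaves` gives one variance estimate per tree size, which is the kind of size control the paragraph above calls for.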

Population sizing has been important to GAs and is now important to GP, because it is the
principal factor in controlling ultimate solution quality. Once the quality-size relation is understood,
populations can be sized to obtain a desired quality, and only two things can happen in empirical
trials. Either the quality goal is equaled or exceeded, in which case all is well with the design of the
algorithm, or (as is more likely) the quality target is missed, in which case there is some other
obstacle to be overcome in the algorithm design. Moreover, once population size is understood in
this way, it can be combined with an understanding of run duration (citation), thereby yielding first
estimates of GP run complexity, a key milestone in making our understanding of these processes
more rigorous.
A Derivation of the Average Number of Expressed Building Blocks
for the ORDER Problem

The following derivation provides an expression for the average number of expressed building blocks
(BBs), best or second best, in other partitions, given that a best or second-best BB is already
expressed in a particular partition. For example, we assume that either $X_1$ or $\bar{X}_1$ is expressed in a
tree. Therefore, the total number of leaf nodes available to potentially express other BBs is $n_l - 1$.

Given that the problem has $m$ building blocks, the total number of terminals is $2m$ (recall
that the terminal set is $T = \{X_1, \bar{X}_1, X_2, \bar{X}_2, \ldots, X_m, \bar{X}_m\}$). Therefore, the total number of possible terminal
sequences given $n_l - 1$ leaf nodes, $N_{tot}$, is

$$ N_{tot} = (2m)^{n_l - 1}. \qquad (27) $$

The number of building blocks expressed in the $n_l - 1$ nodes varies from 0 to $m - 1$ (note that
we assume that one building block is already expressed). That is, if only $X_1$ or $\bar{X}_1$ is present in
the remaining $n_l - 1$ leaf nodes, the number of expressed building blocks other than $X_1$ or $\bar{X}_1$ is
zero. Similarly, if at least one primitive from each of the other $m - 1$ complementary pairs is present in the
$n_l - 1$ leaf nodes, then the number of BBs expressed other than $X_1$ or $\bar{X}_1$ is $m - 1$. For brevity, in
the remainder of this report, the number of expressed BBs refers only to the BBs expressed in the $n_l - 1$
leaf nodes.

Before proceeding with the derivation itself, we develop a few identities that will be used
throughout the derivation.



where a > 2 is an integer.
Here again, a > 2 is an integer.

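Number of expressed BBs = 0. The number of ways in which only $X_1$ or $\bar{X}_1$ is present in the $n_l - 1$ nodes
is

$$ N(\text{Terminals present} = X_1 \text{ or } \bar{X}_1) = \sum_{j=0}^{n_l-1} \frac{(n_l-1)!}{j!\,(n_l-1-j)!} \qquad (33) $$

$$ = \sum_{j=0}^{n_l-1} \binom{n_l-1}{j} \qquad (34) $$

$$ = 2^{n_l-1}. \qquad (35) $$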



Number of expressed BBs = 1. Here the terminals that can be present in the $n_l - 1$ nodes are
$X_1$ or $\bar{X}_1$ or exactly one of the other complementary pairs. Therefore, we begin by counting
the number of ways of having at least one copy of either $X_2$ or $\bar{X}_2$ in $n_l - 1$ nodes. In other
words, we count the number of ways in which only $X_2$ or its complement, $\bar{X}_2$, can be expressed.
Grouping the sequences by the number $t \geq 1$ of nodes that hold $X_2$ or $\bar{X}_2$,

$$ N(\text{Terminals present} = X_1 \text{ or } \bar{X}_1 \text{ or } X_2 \text{ or } \bar{X}_2) = \sum_{t=1}^{n_l-1} \binom{n_l-1}{t}\, 2^{t}\, 2^{\,n_l-1-t} = 4^{n_l-1} - 2^{n_l-1}. $$

In arriving at the closed form we use the identity given by Equation 28.

Note that we chose $X_2$ (or equivalently its complement, $\bar{X}_2$) as an example. In fact there are
$\binom{m-1}{1}$ alternatives to choose from. Therefore, the total number of ways in which only
one BB gets expressed in $n_l - 1$ nodes is given by

$$ N(n_{BB} = 1) = \binom{m-1}{1} \left(4^{n_l-1} - 2^{n_l-1}\right). $$
Number of expressed BBs = 2. Here the terminals that can be present in the $n_l - 1$ nodes are
$X_1$ or $\bar{X}_1$ or exactly two other complementary pairs. Therefore, we begin by counting the
number of ways of having at least one copy of either $X_2$ or $\bar{X}_2$ and at least one copy of either
$X_3$ or $\bar{X}_3$ in $n_l - 1$ nodes. In other words, we count the number of ways in which only $X_2$ or
its complement, $\bar{X}_2$, and $X_3$ or its complement, $\bar{X}_3$, can be expressed. This count is the difference
of two nested summations, $S_1 - S_2$. The second summation removes the extra counting of the case when
neither $X_2$ nor its complement, $\bar{X}_2$, is present in the $n_l - 1$ nodes. In other words, it ensures the
presence of at least one copy of either $X_2$ or $\bar{X}_2$. Evaluating both summations, we get

$$ N(\text{Terminals present} = X_1 \text{ or } \bar{X}_1 \text{ or } X_2 \text{ or } \bar{X}_2 \text{ or } X_3 \text{ or } \bar{X}_3) = S_1 - S_2 = 6^{n_l-1} - 2 \cdot 4^{n_l-1} + 2^{n_l-1}. $$

Note that we chose $X_2$ and $X_3$ (or equivalently their complements, $\bar{X}_2$ and $\bar{X}_3$) as an example.
In fact there are $\binom{m-1}{2}$ alternative pairs to choose from. Therefore, the total number of
ways in which exactly two BBs get expressed in $n_l - 1$ nodes is given by

$$ N(n_{BB} = 2) = \binom{m-1}{2}\left(6^{n_l-1} - 2 \cdot 4^{n_l-1} + 2^{n_l-1}\right) = \frac{(m-1)(m-2)}{2}\left(6^{n_l-1} - 2 \cdot 4^{n_l-1} + 2^{n_l-1}\right). \qquad (61) $$



Number of expressed BBs = 3. Here the terminals that can be present in the $n_l - 1$ nodes are
$X_1$ or $\bar{X}_1$ or exactly three other complementary pairs. Therefore, we begin by counting the
number of ways of having at least one copy of either $X_2$ or $\bar{X}_2$, at least one copy of either $X_3$
or $\bar{X}_3$, and at least one copy of either $X_4$ or $\bar{X}_4$ in $n_l - 1$ nodes. In other words, we count
the number of ways in which only $X_2$ or its complement $\bar{X}_2$, $X_3$ or its complement $\bar{X}_3$, and $X_4$
or its complement $\bar{X}_4$ can be expressed. Writing $c_1, \ldots, c_8$ for the numbers of copies of
$X_1, \bar{X}_1, \ldots, X_4, \bar{X}_4$, this count is the multinomial sum

$$ N(\text{Terminals present} = X_1 \text{ or } \bar{X}_1 \text{ or } \cdots \text{ or } X_4 \text{ or } \bar{X}_4) = \sum_{\substack{c_1 + \cdots + c_8 = n_l - 1 \\ c_3 + c_4 \ge 1,\; c_5 + c_6 \ge 1,\; c_7 + c_8 \ge 1}} \frac{(n_l-1)!}{c_1!\, c_2! \cdots c_8!} \qquad (62) $$

The above equation can be rewritten as a combination of four nested binomial sums,

$$ N(\text{Terminals present} = X_1 \text{ or } \bar{X}_1 \text{ or } \cdots \text{ or } X_4 \text{ or } \bar{X}_4) = S_1 - S_2 - S_3 + S_4. \qquad (63) $$

Consider the sum

$$ S_4 = \sum_{j=0}^{n_l-4} \binom{n_l-1}{j} \sum_{k=0}^{n_l-4-j} \binom{n_l-1-j}{k} \sum_{u=0}^{n_l-1-j-k} \binom{n_l-1-j-k}{u}. \qquad (64) $$

The innermost sum equals $2^{n_l-1-j-k}$. The sum over $k$ stops three terms short of $n_l - 1 - j$, so it equals
$3^{n_l-1-j}$ minus the three missing terms:

$$ S_4 = \sum_{j=0}^{n_l-4} \binom{n_l-1}{j} \left[ 3^{n_l-1-j} - 1 - 2(n_l-1-j) - 2(n_l-1-j)(n_l-2-j) \right]. \qquad (65) $$

The bracket vanishes for $j = n_l - 3$, $n_l - 2$ and $n_l - 1$, so the outer sum can be extended to $j = n_l - 1$
and evaluated with the identities given earlier:

$$ S_4 = 4^{n_l-1} - 2^{n_l-1} - n_l(n_l-1)\, 2^{n_l-2}. \qquad (70) $$

Evaluating the remaining sums in the same way (Equations 71-107) gives

$$ S_3 = 6^{n_l-1} - 4^{n_l-1} + 2(n_l-1)\, 2^{n_l-1} - 3 n_l (n_l-1)\, 2^{n_l-2}, \qquad (79) $$

$$ S_2 = 6^{n_l-1} - \tfrac{1}{2}(n_l+1)\, 4^{n_l-1} + (n_l-1)\, 2^{n_l-1} - n_l(n_l-1)\, 2^{n_l-2}, $$

$$ S_1 = 8^{n_l-1} - 6^{n_l-1} + 4^{n_l-1} - \tfrac{1}{2}(n_l+1)\, 4^{n_l-1} + 3(n_l-1)\, 2^{n_l-1} - 3 n_l (n_l-1)\, 2^{n_l-2}. \qquad (107) $$

Using Equations 70-107 we get

$$ N(\text{Terminals present} = X_1 \text{ or } \bar{X}_1 \text{ or } \cdots \text{ or } X_4 \text{ or } \bar{X}_4) = S_1 - S_2 - S_3 + S_4 \qquad (108) $$

$$ = 8^{n_l-1} - 3 \cdot 6^{n_l-1} + 3 \cdot 4^{n_l-1} - 2^{n_l-1}. \qquad (109) $$

Note that we chose $X_2$, $X_3$ and $X_4$ (or equivalently their complements, $\bar{X}_2$, $\bar{X}_3$ and $\bar{X}_4$) as
an example. In fact there are $\binom{m-1}{3}$ alternative triplets to choose from. Therefore, the
total number of ways in which exactly three BBs get expressed in $n_l - 1$ nodes is given by

$$ N(n_{BB} = 3) = \binom{m-1}{3}\, N(\text{Terminals present} = X_1 \text{ or } \bar{X}_1 \text{ or } \cdots \text{ or } X_4 \text{ or } \bar{X}_4) \qquad (110) $$

$$ = \binom{m-1}{3} \left( 8^{n_l-1} - 3 \cdot 6^{n_l-1} + 3 \cdot 4^{n_l-1} - 2^{n_l-1} \right). \qquad (111) $$
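The closed forms for $S_1, \ldots, S_4$ can be checked numerically. The sketch below (function names hypothetical) confirms, for small $n_l$, that $S_1 - S_2 - S_3 + S_4$ matches Equation 109 and a brute-force count of leaf sequences over $\{X_1, \bar{X}_1, \ldots, X_4, \bar{X}_4\}$ in which the pairs for $X_2$, $X_3$ and $X_4$ all occur.

```python
from itertools import product


def s_terms(n_l):
    """Closed forms of S1..S4 for the three-BB case, kept in exact integer arithmetic."""
    N = n_l - 1
    poly = n_l * (n_l - 1) * 2 ** n_l // 4  # n_l (n_l - 1) 2^(n_l - 2)
    half = (n_l + 1) * 4 ** N // 2          # (1/2)(n_l + 1) 4^(n_l - 1)
    s1 = 8**N - 6**N + 4**N - half + 3 * (n_l - 1) * 2**N - 3 * poly
    s2 = 6**N - half + (n_l - 1) * 2**N - poly
    s3 = 6**N - 4**N + 2 * (n_l - 1) * 2**N - 3 * poly
    s4 = 4**N - 2**N - poly
    return s1, s2, s3, s4


def closed_form(n_l):
    """Equation 109: 8^N - 3*6^N + 3*4^N - 2^N with N = n_l - 1."""
    N = n_l - 1
    return 8**N - 3 * 6**N + 3 * 4**N - 2**N


def brute_force(n_l):
    """Symbols 0..7 stand for X1, X1bar, ..., X4, X4bar; pair index = symbol // 2.
    Count sequences of length n_l - 1 in which pairs 1, 2 and 3 (X2, X3, X4) all occur."""
    return sum(1 for seq in product(range(8), repeat=n_l - 1)
               if {1, 2, 3} <= {s // 2 for s in seq})
```

For $n_l = 4$ the three remaining leaves must hold one terminal from each of the three pairs, so all three functions agree on $3! \cdot 2^3 = 48$.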



From the above cases, we can generalize: the number of ways of expressing $i$ BBs in $n_l - 1$ nodes is
given by

$$ N(n_{BB} = i) = \binom{m-1}{i} \sum_{j=0}^{i} \binom{i}{j} (-1)^j \left[ 2(i - j + 1) \right]^{n_l-1}. \qquad (112) $$

Recall that the total number of ways of arranging the $2m$ terminals in $n_l - 1$ nodes is given by
$N_{tot} = (2m)^{n_l-1}$. Therefore, the probability of expressing $i$ BBs is given by

$$ P(n_{BB} = i) = \frac{N(n_{BB} = i)}{N_{tot}} = \binom{m-1}{i} \sum_{j=0}^{i} \binom{i}{j} (-1)^j \left( \frac{i - j + 1}{m} \right)^{n_l-1}. \qquad (113) $$

The average number of expressed building blocks other than the one on which the decision is being made
is

$$ \mu_{n_{BB}} = \sum_{i=0}^{m-1} i \binom{m-1}{i} \sum_{j=0}^{i} \binom{i}{j} (-1)^j \left( \frac{i - j + 1}{m} \right)^{n_l-1}. $$

The variance in the number of expressed building blocks other than the one on which the decision is
being made is

$$ \sigma^2_{n_{BB}} = \sum_{i=0}^{m-1} i^2 \binom{m-1}{i} \sum_{j=0}^{i} \binom{i}{j} (-1)^j \left( \frac{i - j + 1}{m} \right)^{n_l-1} - \mu^2_{n_{BB}}. $$
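Equation 113 and the mean and variance expressions above can be evaluated exactly with rational arithmetic. The sketch below (function names hypothetical) also checks them against brute-force enumeration for small $m$ and $n_l$, and against the closed form $(m-1)\left[1 - \left(\frac{m-1}{m}\right)^{n_l-1}\right]$ for the mean. That closed form follows from linearity of expectation, because each of the other $m - 1$ pairs appears among the $n_l - 1$ leaves with probability $1 - \left(\frac{m-1}{m}\right)^{n_l-1}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_n_bb(i, m, n_l):
    """Equation 113: probability that exactly i of the other m - 1 BBs are expressed
    in the remaining n_l - 1 leaves (exact rational arithmetic)."""
    return comb(m - 1, i) * sum(
        comb(i, j) * (-1) ** j * Fraction(i - j + 1, m) ** (n_l - 1)
        for j in range(i + 1)
    )


def mean_and_variance(m, n_l):
    """Mean and variance of the number of expressed BBs, built from Equation 113."""
    probs = [prob_n_bb(i, m, n_l) for i in range(m)]
    mean = sum(i * p for i, p in enumerate(probs))
    var = sum(i * i * p for i, p in enumerate(probs)) - mean ** 2
    return mean, var


def brute_force_probs(m, n_l):
    """Enumerate all (2m)^(n_l - 1) leaf sequences; symbol s belongs to pair s // 2,
    and pair 0 is the already-expressed BB, so it is excluded from the count."""
    counts = [0] * m
    for seq in product(range(2 * m), repeat=n_l - 1):
        counts[len({s // 2 for s in seq} - {0})] += 1
    total = (2 * m) ** (n_l - 1)
    return [Fraction(c, total) for c in counts]
```

For example, with $m = 3$ and $n_l = 4$ the mean is $2\,[1 - (2/3)^3] = 38/27$, which `mean_and_variance(3, 4)[0]` returns as `Fraction(38, 27)`.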

B Estimating Tree Sizes

We start by defining two utility procedures that generate a not-necessarily-full tree and a full tree, respectively.
We have named them accordingly; in common GP parlance they correspond to GROW and
FULL. These procedures are called by RAMPED-FULL, RAMPED-GROW and RAMPED-HALF-HALF.

Both algorithms are parameterized by:

• maxHeight: the maximum allowable height of the tree

• q: the probability with which the terminal set is used to draw a new tree node

Often q is implicitly set as the frequency of terminal nodes in the primitive set, and GP practitioners simply set
maxHeight. However, sometimes (as we do in the ORDER problem) a bias between functions and
terminals is introduced. We note that Luke (Luke, 2000c) has similar versions of these algorithms
without q made explicit.
Algorithm I: create-tree-not-necessarily-full(q, maxHeight)

    // create trees of more than 1 node; the root is at height 1
    root = get-function()
    left-child = create-subtree(q, maxHeight, 2)
    right-child = create-subtree(q, maxHeight, 2)
    return make-tree(root, left-child, right-child)

procedure create-subtree(q, maxHeight, current-height)
    // a node at height maxHeight must be a terminal
    if current-height = maxHeight then
        return get-terminal()
    else if rand(0,1) < q then
        return get-terminal()
    else
        node = get-function()
        left-child = create-subtree(q, maxHeight, current-height + 1)
        right-child = create-subtree(q, maxHeight, current-height + 1)
        return make-tree(node, left-child, right-child)

The create-tree-not-necessarily-full algorithm creates a GP tree of height between 2 and maxHeight
(maxHeight ≥ 2), not allowing a single leaf to be generated as a tree. The tree is not necessarily full. After
drawing a function for the tree's root node, it uses q to decide whether each child subtree is a function
or a terminal, except when the parent node is at height maxHeight - 1. In that case, a terminal
is always generated as the child subtree. This ensures that no tree has height greater than maxHeight.
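A runnable Python sketch of Algorithm I, using nested tuples `("F", left, right)` for function nodes and `"T"` for terminals. It follows the height convention of the text (the root is at height 1 and heights count levels) and is an illustration, not the authors' implementation; `mean_size` gives a Monte Carlo estimate of average tree size that can be compared with the analysis below.

```python
import random


def grow_tree(q, max_height, rng, level=1):
    """Not-necessarily-full binary tree. The root (level 1) is always a function;
    a node at level max_height is always a terminal; otherwise a terminal is drawn with probability q."""
    if max_height < 2:
        raise ValueError("max_height must be at least 2 (single-leaf trees are not allowed)")
    if level > 1 and (level == max_height or rng.random() < q):
        return "T"
    return ("F",
            grow_tree(q, max_height, rng, level + 1),
            grow_tree(q, max_height, rng, level + 1))


def size(tree):
    """Number of nodes in the tree."""
    return 1 if tree == "T" else 1 + size(tree[1]) + size(tree[2])


def height(tree):
    """Number of levels in the tree (a single terminal has height 1)."""
    return 1 if tree == "T" else 1 + max(height(tree[1]), height(tree[2]))


def mean_size(q, max_height, samples=10_000, seed=0):
    """Monte Carlo estimate of the average size of trees produced by grow_tree."""
    rng = random.Random(seed)
    return sum(size(grow_tree(q, max_height, rng)) for _ in range(samples)) / samples
```

The two extremes behave as expected: with $q = 1$ every tree is a root with two terminals (size 3), and with $q = 0$ every tree is full to maxHeight (size $2^{\text{maxHeight}} - 1$).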



Algorithm II: create-tree-full(q, maxHeight)

    // create full trees of more than 1 node; the root is at height 1
    root = get-function()
    left-child = create-full-subtree(q, maxHeight, 2)
    right-child = make-full-tree(height(left-child))
    return make-tree(root, left-child, right-child)

procedure create-full-subtree(q, maxHeight, current-height)
    if current-height = maxHeight then
        return get-terminal()
    else if rand(0,1) < q then
        return get-terminal()
    else
        node = get-function()
        left-child = create-full-subtree(q, maxHeight, current-height + 1)
        right-child = make-full-tree(height(left-child))
        return make-tree(node, left-child, right-child)

procedure make-full-tree(h)
    // deterministic full tree with h levels
    if h = 1 then
        return get-terminal()
    else
        return make-tree(get-function(), make-full-tree(h - 1), make-full-tree(h - 1))

The create-tree-full algorithm creates a GP tree of height between 2 and maxHeight, not allowing
a single leaf to be generated as a tree. The tree is always full. After drawing a function for the
tree's root node, it uses q to decide whether the left child subtree is a function or a terminal,
except when the parent node is at height maxHeight - 1, in which case a terminal is always generated
as the child subtree. This ensures that no tree has height greater than maxHeight. Each right child
subtree is generated as a full tree whose height equals the height of its left sibling subtree, which
is what makes the whole tree full.
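A matching Python sketch for Algorithm II. The left spine is grown with probability $q$ of stopping at each level, and each right subtree is a deterministic full tree of the same height as its left sibling, so every generated tree is full. The names and the nested-tuple representation are hypothetical. Heights count levels, so a full tree of $L$ levels has $2^L - 1$ nodes.

```python
import random


def full_tree(h):
    """Deterministic full binary tree with h levels; function nodes are ("F", left, right)."""
    return "T" if h == 1 else ("F", full_tree(h - 1), full_tree(h - 1))


def height(tree):
    """Number of levels in the tree."""
    return 1 if tree == "T" else 1 + max(height(tree[1]), height(tree[2]))


def size(tree):
    """Number of nodes in the tree."""
    return 1 if tree == "T" else 1 + size(tree[1]) + size(tree[2])


def full_subtree(q, max_height, rng, level):
    """Left spine: stop with a terminal at level max_height or with probability q."""
    if level == max_height or rng.random() < q:
        return "T"
    left = full_subtree(q, max_height, rng, level + 1)
    return ("F", left, full_tree(height(left)))  # right sibling matches the left height


def create_tree_full(q, max_height, rng):
    """Full tree with between 2 and max_height levels (root is always a function)."""
    if max_height < 2:
        raise ValueError("max_height must be at least 2 (single-leaf trees are not allowed)")
    left = full_subtree(q, max_height, rng, 2)
    return ("F", left, full_tree(height(left)))
```

Because the tree's height is fixed by where the left spine stops, the height follows the geometric pattern used in the full-tree analysis that comes next.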

Usually these procedures are subsumed by procedures that create an initial population with 
random fitness but predetermined expected GP tree structure. The procedures are: 

• ramped full. Create subsamples of trees for each height, h, between height 1 and maxHeight. 
Each subsample has full trees of height up to h. 

• ramped not-necessarily-full. Create subsamples of trees for each height, h, between 1 and 
maxHeight. Each subsample has not-necessarily full trees of height up to h. 

• ramped half-half (implying half full and half not-necessarily-full). Create two subsamples for
each height, h, between height 1 and maxHeight. One subsample has full trees of height up
to h and one subsample has not-necessarily-full trees of height up to h.

Assuming any of these algorithms is executed to generate a tree of size $s$, because the tree is
binary, the following is known:

1. the number of leaves (terminals), $t_s = (s+1)/2$

2. the number of internal nodes (functions), $f_s = (s-1)/2$

The average size of a tree created using Algorithm create-tree-not-necessarily-full can be
estimated as follows:

1. A tree of height $h$ has a range of possible sizes from $s_{\min} = 2h + 1$ to $s_{\max} = 2^{h+1} - 1$. This
range is $s_{\min}, s_{\min} + 2, \ldots, s_{\max}$.

2. The probability of a tree of size $s$, given that it has height $h$ and $h < h_{\max}$, where $h_{\max}$ is maxHeight:

$$ p(s \mid h;\, h < h_{\max}) = (1 - q)^{f_s} \qquad (117) $$

3. The probability of a tree of height $h < h_{\max}$:

$$ p(h < h_{\max}) = \sum_{h=1}^{h_{\max}-1} \; \sum_{s = s_{\min},\ \text{by } 2}^{s_{\max}} p(h \mid s) \qquad (118) $$

4. The average size of a tree of height $h$:

$$ s(h) = \sum_{s = s_{\min},\ \text{by } 2}^{s_{\max}} p(s \mid h)\, s \qquad (119) $$

5. The average size of trees of height $h < h_{\max}$:

$$ s(h;\, h < h_{\max}) = \sum_{h=1}^{h_{\max}-1} s(h)\, \lVert p(h \mid s) \rVert \qquad (120) $$

6. The average size of a tree of height $h = h_{\max}$ can be estimated conservatively (an underestimate):

$$ s(h = h_{\max}) = \;\ldots \qquad (121) $$

7. The probability of a tree of height $h = h_{\max}$:

$$ p(h = h_{\max}) = 1 - p(h < h_{\max}) \qquad (122) $$

8. The estimated average size of any tree:

$$ \bar{s} = p(h = h_{\max})\, s(h = h_{\max}) + p(h < h_{\max})\, s(h < h_{\max}) \qquad (123) $$

The average size of a tree created using Algorithm create-tree-full can be estimated as follows:

1. The probability of a tree of height $h$ when $h <$ maxHeight:

$$ p(h) = (1 - q)^{h-1} q \qquad (124) $$

2. The probability of any tree of height $h$ less than maxHeight, $h_{\max}$:

$$ p(h < h_{\max}) = \sum_{h=1}^{h_{\max}-1} p(h) = \sum_{h=1}^{h_{\max}-1} (1 - q)^{h-1} q = 1 - (1 - q)^{h_{\max}-1}. $$

3. The probability of a tree of height $h = h_{\max}$:

$$ p(h = h_{\max}) = 1 - p(h < h_{\max}) = (1 - q)^{h_{\max}-1}. \qquad (125) $$

4. The size of a tree of height $h$: $s(h) = 2^{h+1} - 1$.

5. The average size of a tree of height $h < h_{\max}$ (valid for $q \neq 1/2$):

$$ s(h < h_{\max}) = \sum_{h=1}^{h_{\max}-1} \lVert p(h) \rVert\, s(h) = \frac{1}{1 - (1-q)^{h_{\max}-1}} \sum_{h=1}^{h_{\max}-1} \left(2^{h+1} - 1\right) q (1-q)^{h-1} = \frac{4q}{2q - 1} \cdot \frac{1 - [2(1-q)]^{h_{\max}-1}}{1 - (1-q)^{h_{\max}-1}} - 1. \qquad (126) $$

6. The average size of a tree of height $h = h_{\max}$:

$$ s(h = h_{\max}) = \lVert p(h = h_{\max}) \rVert\, s(h = h_{\max}) = 2^{h_{\max}+1} - 1. \qquad (127) $$

7. The average size of a tree, $\bar{s}$, is (again for $q \neq 1/2$)

$$ \bar{s} = s(h = h_{\max})\, p(h = h_{\max}) + s(h < h_{\max})\, p(h < h_{\max}) = \left(2^{h_{\max}+1} - 1\right)(1-q)^{h_{\max}-1} + \frac{4q \left(1 - [2(1-q)]^{h_{\max}-1}\right)}{2q - 1} - \left[1 - (1-q)^{h_{\max}-1}\right] = \frac{2q + 1 - 2\,[2(1-q)]^{h_{\max}}}{2q - 1}. \qquad (128) $$
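Equation 128 can be checked against a direct evaluation of $\sum_h p(h)\, s(h)$ using Equations 124 and 125 and $s(h) = 2^{h+1} - 1$. In this analysis $h$ counts edges from the root, so $h_{\max}$ corresponds to a tree of $h_{\max} + 1$ levels. The sketch (function names hypothetical) assumes $q \neq 1/2$, where the closed form $\left(2q + 1 - 2[2(1-q)]^{h_{\max}}\right)/(2q - 1)$ has a removable singularity.

```python
def avg_full_size_closed(q, h_max):
    """Closed form (2q + 1 - 2[2(1-q)]^h_max) / (2q - 1); requires q != 1/2."""
    if q == 0.5:
        raise ValueError("closed form is 0/0 at q = 1/2; use avg_full_size_direct")
    return (2 * q + 1 - 2 * (2 * (1 - q)) ** h_max) / (2 * q - 1)


def avg_full_size_direct(q, h_max):
    """Sum of p(h) * s(h): heights below h_max have p(h) = (1-q)^(h-1) q, and the
    remaining mass (1-q)^(h_max-1) sits at h_max; s(h) = 2^(h+1) - 1."""
    total = sum((1 - q) ** (h - 1) * q * (2 ** (h + 1) - 1) for h in range(1, h_max))
    total += (1 - q) ** (h_max - 1) * (2 ** (h_max + 1) - 1)
    return total
```

For instance, with $h_{\max} = 2$ both functions give $7 - 4q$ (5.8 at $q = 0.3$): a 3-node tree with probability $q$ and a 7-node tree otherwise.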



The average size of a tree created using ramped full, ramped not-necessarily-full or ramped half-half can
then be calculated from these results; the derivation is omitted here.

Hence, given q, maxHeight and the GP tree initialization algorithm, we can use the equations above
to derive an estimate of the average GP tree size.